Although multiprocessor hardware is finally becoming ubiquitous, enticing most programmers to write parallel programs will be very challenging. For this reason, I believe the main problem confronting computer architects today is designing computer systems that help simplify parallel programming.
In this talk I will present two novel, powerful computer architecture primitives that help simplify parallel programming. The first one is Bulk — a hardware framework for performing sets of memory operations in bulk. Bulk is used as a building block to support interactions between multiple threads, enabling high-programmability environments such as high-performance sequential memory consistency, thread-level speculation, and transactional memory. The second technique is Colorama — hardware support for Data-Centric Synchronization. Colorama associates concurrency control constraints with data, providing an attractive alternative to the traditional code-centric approach of locks and transactions. Together, these two techniques offer promising directions in the critical area of novel multiprocessor architectures for programmability.
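To make the data-centric idea concrete, the following is a minimal software analogy, not Colorama's hardware mechanism. It assumes a hypothetical `Guarded` wrapper in which the lock travels with the data, so code that touches the data cannot forget to synchronize. The names `Guarded`, `access`, `deposit`, and `run_demo` are illustrative only and do not come from the talk.

```python
import threading
from contextlib import contextmanager


class Guarded:
    """Hypothetical wrapper: the concurrency constraint lives with the data."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    @contextmanager
    def access(self):
        # The only way to reach the data is through this guarded scope,
        # so callers never name or manage a lock themselves.
        with self._lock:
            yield self._value


def deposit(account, amount, times):
    # No explicit lock in the calling code: synchronization follows the data.
    for _ in range(times):
        with account.access() as state:
            state["balance"] += amount


def run_demo(threads=4, times=1000):
    account = Guarded({"balance": 0})
    workers = [
        threading.Thread(target=deposit, args=(account, 1, times))
        for _ in range(threads)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    with account.access() as state:
        return state["balance"]


if __name__ == "__main__":
    print(run_demo())
```

In a code-centric style, every function that updates the balance would have to acquire the right lock itself. Here the constraint is declared once, where the data is created, which is the contrast the abstract draws. A hardware scheme would presumably enforce this without the explicit `access()` scope, but that detail is an assumption rather than something stated above.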